Mining Flipping Correlations from Large Datasets with Taxonomies
Authors
Abstract
In this paper we introduce a new type of pattern: the flipping correlation pattern. Flipping patterns are obtained by contrasting the correlations between items at different levels of abstraction. They represent surprising correlations, both positive and negative, which are specific to a given abstraction level and which “flip” from positive to negative, or vice versa, when the items are generalized to a higher level of abstraction. We design an efficient algorithm for finding flipping correlations, the Flipper algorithm, which outperforms naive pattern mining methods by several orders of magnitude. We apply Flipper to real-life datasets and show that the discovered patterns are non-redundant, surprising and actionable. Flipper finds strong contrasting correlations in itemsets with low-to-medium support, a frequency range in which existing techniques cannot handle pattern discovery.
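To make the notion of a "flip" concrete, the following is a minimal brute-force sketch for a single item pair and a two-level taxonomy. It is not the Flipper algorithm. The abstract does not name a correlation measure or thresholds, so the Kulczynski measure, the cut-offs `pos` and `neg`, the toy transactions, and the `PARENT` taxonomy are all assumptions chosen for illustration.

```python
# Illustrative sketch only: checks one item pair for a correlation "flip"
# between the leaf level and the parent level of a toy taxonomy.
# The measure (Kulczynski) and the thresholds are assumptions, not taken
# from the abstract.

# Hypothetical two-level taxonomy: leaf item -> parent category.
PARENT = {"cola": "drinks", "juice": "drinks", "chips": "snacks", "nuts": "snacks"}

# Hypothetical transactions over leaf items.
TRANSACTIONS = [
    frozenset({"cola", "nuts"}),
    frozenset({"cola", "nuts"}),
    frozenset({"juice", "chips"}),
    frozenset({"juice", "chips"}),
    frozenset({"cola"}),
    frozenset({"chips"}),
]


def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def kulczynski(transactions, a, b):
    """Average of P(b | a) and P(a | b); 0.0 if either item never occurs."""
    sup_a = support(transactions, {a})
    sup_b = support(transactions, {b})
    if sup_a == 0 or sup_b == 0:
        return 0.0
    sup_ab = support(transactions, {a, b})
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def label(value, pos=0.6, neg=0.3):
    """Map a correlation value to a label using assumed thresholds."""
    if value >= pos:
        return "positive"
    if value <= neg:
        return "negative"
    return "neutral"


def generalize(transactions, parent):
    """Replace every leaf item by its parent category."""
    return [frozenset(parent[item] for item in t) for t in transactions]


def is_flipping(transactions, parent, a, b):
    """Return (flips, leaf_label, parent_label) for the pair (a, b)."""
    leaf = label(kulczynski(transactions, a, b))
    upper = label(kulczynski(generalize(transactions, parent), parent[a], parent[b]))
    return {leaf, upper} == {"positive", "negative"}, leaf, upper


if __name__ == "__main__":
    print(is_flipping(TRANSACTIONS, PARENT, "cola", "chips"))
    print(is_flipping(TRANSACTIONS, PARENT, "cola", "nuts"))
```

In this toy data, cola and chips never occur together, yet their categories (drinks and snacks) co-occur in four of six transactions, so the pair flips from negative to positive under generalization. Cola and nuts are positively correlated at both levels, so that pair does not flip.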
Similar resources
Twitter data analysis by means of Strong Flipping Generalized Itemsets
Twitter data has recently been used to perform a large variety of advanced analyses. Analysis of Twitter data imposes new challenges because the data distribution is intrinsically sparse, due to the large number of messages posted every day using a wide vocabulary. Aimed at addressing this issue, generalized itemsets, i.e., sets of items at different abstraction levels, can be effectively mined an...
Towards a Framework for Semantic Exploration of Frequent Patterns
Mining frequent patterns is an essential task in discovering hidden correlations in datasets. Although frequent patterns unveil valuable information, some challenges limit their usability. First, the number of possible patterns is often very large, which hinders their effective exploration. Second, patterns with many items are hard to read and the analyst may be unable to unders...
Mining Generalised Emerging Patterns
Emerging Patterns (EPs) are a data mining model that is useful for discovering distinctions inherently present amongst a collection of datasets. However, current EP mining algorithms do not handle attributes whose values are associated with taxonomies (is-a hierarchies). Current EP mining techniques are restricted to using only the leaf-level attribute values in a taxonomy. In this p...
Automatic Construction of Taxonomies of Categories
Hierarchies are an intuitive and effective organization paradigm for data. Of late there has been considerable research on automatically learning hierarchical organizations of data. In this paper, we formulate the problem of “automatically constructing hierarchical taxonomies”, which we define as learning a hierarchy of categories with no user-defined parameters. We propose a framework that c...
Digging deep into weighted patient data through multiple-level patterns
Large data volumes have been collected by healthcare organizations at an unprecedented rate. Today both physicians and healthcare system managers are very interested in extracting value from such data. Nevertheless, the increasing data complexity and heterogeneity prompt the need for new efficient and effective data mining approaches to analyzing large patient datasets. Generalized association...
Journal:
- PVLDB
Volume 5, issue
Pages -
Publication date 2011